This worksheet leans heavily on tidyverse packages.
cdl.table contains relations between crops, their index number in the Cropland Data Layer, and the corresponding name and index number in the FAO Indicative Crop Classification (ICC 1.1).
roi.table contains relations from the regions of interest used to aggregrate the WF, PPT, and CWU calculations performed in steps 8-10. Here, the relation is the attribute table from a shapefile of California county political boundaries. Attributes include: county index, name, and corresponding DWR hydrologic region.
ca.counties contains the shapefile, on which roi.table is based.
ca.hr contains the shapefile of DWR hydrologic regions, for California.
wf.master.wyear contains annual sums of cwr, ppt, etc, wf, and harvested tons/acres/USD aggregrated by crop, county, and water year. It only includes the counties that have corresponding entries in the County Ag Comissioners (CAC) annual yield reports. Counties that had modeled ag production, without an accompanying entry in the CAC annual reports were omitted.
wf.master.cyear contains the above, except aggregrated by calendar year (instead of water year). This data set includes the extra year of 2007.
yield.master contains all of the statistics reported by the CAC annual reports (production tons, harvested acres, value in USD), for the modeled crops.
Typially in a boxplot, outliers are marked as values that fall outside of 1.5 times the interquartile range avove and below the 25% and 75% quantiles. is_extreme() is a function that we define below, that applys the same criteria and returns TRUE if the value is a suspected outlier/extreme falue, and false otherwise.
is_extreme <- function(series) {
#' Tags values in a series according to whether or not they would appear as oulier points
#` in a Tukey boxplot (values that lie outside of the H-spread.)
#` Specifically, these are values that are less or greater than the 1st and 3rd quartiles
#` plus 1.5 times the interquartile range.
#' @param series Vector of type numeric (int or double)
#' @return Vector of type logical, TRUE if the value is an outlier, FALSE if it is not
return(series < (quantile(series, 0.25) - (1.5 * IQR(series)) ) | series > (quantile(series, 0.75) + (1.5 * IQR(series)) ))
}
Here, we clean obviously erronious observations.
I think the 2009 Riverside CAC survey severly under-reports the “Oats” category to the point that it makes the “Oats” water footprint blow up (several orders of magnitude higher than baseline, see the by-year summary notebook).
wf.master.wyear <- wf.master.wyear %>%
filter(!(`cdl.name` == "Oats" & `County` == "Riverside" & `Year` == 2009))
Here, we sum all daily volumes of cwr, ppt, et, and wf by county, over each calendar and water year.
wf.master.bycounty.cyear <- wf.master.cyear %>%
mutate(Year = as.Date(sprintf("%s-01-01", Year), format="%F")) %>%
group_by(roi.index, County, Year) %>%
summarize_at(vars(cwr, ppt, et.b, et.g, wf.b, wf.g), funs(sum(., na.rm = TRUE))) %>%
left_join(roi.table %>% select(NUM,`HR_NAME`),
by = c("roi.index" = "NUM")) %>%
ungroup() %>% # I'm not sure what to think about the expicit `ungroup()`
`attr<-`("year_type", "calendar") %>%
`attr<-`("summary_type", "total")
wf.master.bycounty.wyear <- wf.master.wyear %>%
mutate(Year = as.Date(sprintf("%s-01-01", Year), format="%F")) %>%
group_by(roi.index, County, Year) %>%
summarize_at(vars(cwr, ppt, et.b, et.g, wf.b, wf.g), funs(sum(., na.rm = TRUE))) %>%
left_join(roi.table %>% select(NUM,`HR_NAME`),
by = c("roi.index" = "NUM")) %>%
ungroup() %>% # I'm not sure what to think about the expicit `ungroup()`
`attr<-`("year_type", "calendar") %>%
`attr<-`("summary_type", "total")
wf.master.byHR.wyear <- wf.master.wyear %>%
left_join(roi.table %>%
select(NUM,`HR_NAME`),
by = c("roi.index" = "NUM")) %>%
mutate(Year = as.Date(sprintf("%s-01-01", Year), format="%F")) %>%
group_by(`HR_NAME`, Year) %>%
summarize_at(vars(cwr, ppt, et.b, et.g, wf.b, wf.g), funs(sum(., na.rm = TRUE))) %>%
ungroup() %>%
`attr<-`("year_type", "water") %>%
`attr<-`("summary_type", "total") %>%
`attr<-`("summarized_by", "DWR-HR")
We have imported the spatial layers as simple features, implemented within tidy dataframes. Thus, we can join other attributes to each spatial feature, provided that the indicies align. Here, we are joining the county-wise annual statistics, also visualized in notebook 2, 5.yr_avg_cnty.Rmd.
For California counties, we join the average yearly totals (from WY 08-15) to each county polygon.
For DWR Hydrologic Regions, we total the average wf/cwu/ppt of all of the counties that lie within a particular hydrological region. The relation between counties -> hydrologic regions was done earlier with a spatial join based on county centroids in GDAL+QGIS. That is, a county was classed into a particular hydrological region depending on which region its centroid resided.
Note: We use an inner join here to prevent the introduction of NAs from counties that are present in our shapefile, but not in the agricultural observations. Specifically, “San Francisco” is the county that never had any observed (read: modeled) agricultural production.
TODO: Here, we are joining multiple years worth of data to a sf data frame. The resultant data frame will have multiple rows describing ‘observations’ for the same geographic region–an observation for each year. This means that spatial geometry attributes are repeated for each yearly observation, multipling the size of the original spatial data frame by the number of repeated observations (in this case, a factor of 8, for 8 years). There may be a more efficient way to do this join that does not result in repeated spatial data, but there may not. ¯_(ツ)_/¯
ca.counties.wy <- ca.counties %>%
inner_join(wf.master.bycounty.wyear %>%
select(roi.index, Year, cwr, ppt, et.b, et.g, wf.b, wf.g),
by = c("NUM" = "roi.index"))
ca.hr.wy <- ca.hr %>%
inner_join(wf.master.bycounty.wyear %>%
group_by(`HR_NAME`, Year) %>%
summarize_at(vars(cwr, ppt, et.b, et.g, wf.b, wf.g), funs(sum(., na.rm = TRUE))),
by = c("HR_NAME" = "HR_NAME"))
Tallies within each regon represent the sum of all of observations within a given water year.
ca.counties.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = cwr), color = "white") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "cwr (m3)", trans = "log", option = "viridis",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual CWR by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.counties.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = ppt), color = "white") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_distiller(name = "ppt (m3)", trans = "log", type = "seq", palette = "Purples",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual PPT by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.counties.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = wf.b), color = "blue") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "wf.b (m3/tonne)", trans = "log", option = "magma",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual Blue WF by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.counties.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = wf.g), color = "blue") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "wf.b (m3/tonne)", trans = "log", option = "magma",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual Green WF by county for 2008 - 2015 water years",
subtitle = "log10 scale")
Where HR stands for Hydrologic Region. Tallies within each regon represent the sum of all of the counties whose centroids lie within each region.
ca.hr.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = cwr), color = "white") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "cwr (m3)", trans = "log", option = "viridis",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual CWR by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.hr.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = ppt), color = "white") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_distiller(name = "ppt (m3)", trans = "log", type = "seq", palette = "Purples",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual PPT by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.hr.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = wf.b), color = "blue") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "wf.b (m3/tonne)", trans = "log", option = "magma",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual Blue WF by county for 2008 - 2015 water years",
subtitle = "log10 scale")
ca.hr.wy %>%
ggplot() +
geom_sf(mapping = aes(fill = wf.g), color = "blue") +
facet_wrap(~Year, ncol = 3, dir = "h") +
scale_fill_viridis(name = "wf.b (m3/tonne)", trans = "log", option = "magma",
labels = function(x) format(signif(x,1)/1, scientific = TRUE)) +
labs(title = "Annual Green WF by county for 2008 - 2015 water years",
subtitle = "log10 scale")